fix(fiber): split DA Submit at Fibre's 128 MiB upload cap + duration log #3307

Merged
julienrbrt merged 3 commits into evstack:julien/fiber from walldiss:pr3-fibre-128mib-cap
May 3, 2026

Conversation


@walldiss walldiss commented May 2, 2026

Issue

Under sustained txsim load the DA submitter batched up to 10 pending data items into a single Upload() call, producing a flat payload of ~144 MiB. Fibre's server-side per-upload cap is hard at ~128 MiB ("blob size exceeds maximum allowed size: data size 144366912 exceeds maximum 134217723"), so every batched upload was rejected. With MaxPendingHeadersAndData=10 that took down 170 consecutive submissions before the daemon halted itself with "Data exceeds DA blob size limit".

The Submit path also had no per-call observability — failures surfaced after the fact as DeadlineExceeded or an oversized-blob rejection, with no measurement of how long uploads actually took. During load-test debugging this turned into a guessing game over whether RPCTimeout, the pending cap, or batch sizing was the right knob to turn next.

Solution

  • fiberDAClient.Submit: wrap the fiber.Upload call in a chunker (chunkBlobsForFibre) that groups input blobs into ≤120 MiB chunks (8 MiB headroom under Fibre's 128 MiB cap for flattenBlobs's per-blob length-prefix overhead) and uploads each chunk separately. Aggregates submitted counts and BlobIDs across chunks; on first chunk failure, returns the error with the partially-submitted count so the submitter's retry/backoff sees a coherent state.
  • Per-Submit upload-duration log (info on success, warn on failure): duration, flat blob bytes, blob count, chunk index. Cheap (one time.Since) and gives the operator concrete numbers — e.g. 17 blobs / 115 MiB / 1.5 s — to reason about whether the upload pipeline or something downstream is the bottleneck.
  • evnode-fibre: block.SetMaxBlobSize lowered from 120 MiB to 100 MiB. Companion safety fix: after the chunker splits a multi-blob batch, a single oversized blob would still end up alone in its own chunk and fail server-side. Capping per-block data at 100 MiB ensures even a single block_data item fits in one Fibre upload.
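
The chunking described above can be sketched as a greedy grouping pass. This is a hypothetical reimplementation, not the PR's actual chunkBlobsForFibre: it counts raw blob bytes and relies on the 8 MiB headroom to absorb flattenBlobs's per-blob length prefixes.

```go
package main

import "fmt"

// fibreUploadChunkBudget mirrors the PR's 120 MiB budget: 8 MiB of
// headroom under Fibre's ~128 MiB server-side cap.
const fibreUploadChunkBudget = 120 * 1024 * 1024

// chunkBlobs greedily groups blobs so each chunk's raw size stays within
// budget. A blob that alone exceeds the budget still gets its own chunk,
// to be rejected server-side (or prevented by the per-block cap).
func chunkBlobs(blobs [][]byte, budget int) [][][]byte {
	var chunks [][][]byte
	var cur [][]byte
	curSize := 0
	for _, b := range blobs {
		if curSize+len(b) > budget && len(cur) > 0 {
			chunks = append(chunks, cur) // close the current chunk
			cur, curSize = nil, 0
		}
		cur = append(cur, b)
		curSize += len(b)
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	// Tiny budget for illustration: blobs of 3, 4, and 2 bytes under a
	// 5-byte budget split into three chunks.
	blobs := [][]byte{make([]byte, 3), make([]byte, 4), make([]byte, 2)}
	fmt.Println(len(chunkBlobs(blobs, 5)))
}
```

A blob that alone exceeds the budget still lands in its own chunk; that edge case is what the companion 100 MiB per-block cap guards against.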

Test plan

  • No more "data size N exceeds maximum 134217723" rejections under sustained load
  • No more "single item exceeds DA blob size limit" halts
  • Per-submit upload duration line in evnode logs

walldiss added 3 commits May 3, 2026 01:42
The Fibre Submit path was opaque: failures showed up as
DeadlineExceeded with no signal of how long the upload
actually took, and successes only logged at debug level
inside the upstream library. During load-test debugging
this turned into a guessing game — was the cluster slow,
the deadline too tight, or something stuck mid-RPC?

Add a single info-level (warn-on-failure) log line in
fiberDAClient.Submit covering the Upload call: duration,
flat blob bytes, blob count. Cheap (one time.Since) and
gives the operator concrete numbers — e.g. "17 blobs / 115
MiB / 1.5 s" — to reason about whether RPCTimeout, pending
cap, or batch sizing is the right knob to turn next.

Under sustained txsim load (~50 MiB/s) the DA submitter
batched 10 block_data items into one Upload(), producing a
flat payload of 144 MiB. Fibre's per-upload cap is hard at
~128 MiB ("blob size exceeds maximum allowed size: data
size 144366912 exceeds maximum 134217723") and rejected
every batched upload. With MaxPendingHeadersAndData=10
that took down 170 consecutive submissions before the
node halted itself with "Data exceeds DA blob size limit".

Wrap the Upload call in a chunker that groups input blobs
into ≤120 MiB chunks (8 MiB headroom under Fibre's cap for
the per-blob length-prefix overhead added by flattenBlobs)
and uploads each chunk separately. Aggregates submitted
counts and BlobIDs across chunks; on first chunk failure,
returns the error with the partially-submitted count so
the submitter's retry/backoff logic sees a coherent state
instead of all-or-nothing.

Single oversized blobs (already validated against
DefaultMaxBlobSize earlier in Submit) still land alone and
fail server-side, but at least no longer drag healthy blobs
into the same rejected batch.
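
The aggregate-and-return-partial behavior might look like this sketch (names are illustrative; upload is a stand-in for the real fiber.Upload call):

```go
package main

import (
	"errors"
	"fmt"
)

// errUploadRejected mimics Fibre's server-side rejection.
var errUploadRejected = errors.New("blob size exceeds maximum allowed size")

// submitChunks uploads each chunk in order, accumulating submitted counts
// and IDs. On the first failure it returns the error together with the
// partial count, so retry/backoff logic sees how far the batch got.
func submitChunks(chunks [][][]byte, upload func([][]byte) ([]string, error)) (int, []string, error) {
	var ids []string
	submitted := 0
	for i, chunk := range chunks {
		chunkIDs, err := upload(chunk)
		if err != nil {
			return submitted, ids, fmt.Errorf("chunk %d/%d: %w", i+1, len(chunks), err)
		}
		submitted += len(chunk)
		ids = append(ids, chunkIDs...)
	}
	return submitted, ids, nil
}

func main() {
	calls := 0
	upload := func(chunk [][]byte) ([]string, error) {
		calls++
		if calls == 2 { // second chunk hits the cap
			return nil, errUploadRejected
		}
		return make([]string, len(chunk)), nil
	}
	n, _, err := submitChunks([][][]byte{{{1}, {2}}, {{3}}}, upload)
	fmt.Println(n, err != nil) // first chunk's 2 blobs submitted, then error
}
```
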

Companion to the submitter chunking fix. The submitter can
split a multi-blob batch into ≤120 MiB Fibre uploads, but
a *single* block_data item that exceeds 128 MiB still ends
up alone in its own chunk and fails server-side ("blob size
exceeds maximum allowed size"). Lower the per-block cap to
100 MiB so under high-throughput txsim a single block can't
grow past Fibre's hard limit, and update the comment to
explain the relationship between this cap and Fibre's
~128 MiB upload reject threshold.
coderabbitai Bot commented May 2, 2026

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@julienrbrt julienrbrt merged commit d5f981c into evstack:julien/fiber May 3, 2026
18 of 25 checks passed
// Fibre Upload call. Fibre rejects payloads above ~128 MiB
// ("data size N exceeds maximum 134217723"); 120 MiB leaves slack for
// flattenBlobs's per-blob length prefixes and for any future overhead.
const fibreUploadChunkBudget = 120 * 1024 * 1024
Member

@julienrbrt julienrbrt May 3, 2026


Makes sense — https://github.com/evstack/ev-node/blob/main/block/internal/common/consts.go#L11-L12 would be 120 MB pre-prefixes.
Nice catch!
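
To see why the headroom suffices, a small check under an assumed fixed 8-byte per-blob length prefix (the prefix size is an assumption for illustration — the real flattenBlobs encoding may differ; the cap value is taken from the PR's error message):

```go
package main

import "fmt"

const (
	// Fibre's observed reject threshold, from the error message
	// "data size 144366912 exceeds maximum 134217723".
	fibreMaxUploadBytes = 134217723

	// Assumed fixed per-blob length-prefix overhead (hypothetical).
	assumedPrefixPerBlob = 8

	// The PR's chunk budget: 120 MiB, 8 MiB under the cap.
	fibreUploadChunkBudget = 120 * 1024 * 1024
)

// flattenedSize estimates the flat payload size for n blobs totalling
// rawBytes, with a fixed length prefix per blob.
func flattenedSize(n, rawBytes int) int {
	return rawBytes + n*assumedPrefixPerBlob
}

func main() {
	// A full 120 MiB chunk of 10 blobs stays well under the cap...
	fmt.Println(flattenedSize(10, fibreUploadChunkBudget) <= fibreMaxUploadBytes)
	// ...while the failing batch from the PR (144366912 flat bytes) did not.
	fmt.Println(144366912 <= fibreMaxUploadBytes)
}
```
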

// Set the per-block data cap below that so each block_data item
// fits in a single Fibre upload after the submitter splits a
// multi-blob batch into ≤120 MiB chunks.
block.SetMaxBlobSize(100 * 1024 * 1024)
Member


👍🏾 can we update the ldflags here for consistency: https://github.com/celestiaorg/x402-risotto/blob/main/scripts/run-stack.sh#L61 ?
